fix[next-dace]: Fix Memory Layout for CPU #2459
Merged
philip-paul-mueller merged 11 commits into GridTools:main on Feb 4, 2026
havogt
reviewed
Jan 28, 2026
- unit_strides_kind = (
-     gtx_common.DimensionKind.HORIZONTAL if gpu else gtx_common.DimensionKind.VERTICAL
- )
+ unit_strides_kind = gtx_common.DimensionKind.HORIZONTAL
Contributor
Why does that make sense? You cannot assume anything...
Contributor
Or is that just for transients? Then I would change the comment assume -> set or something.
Contributor
Author
There are two things here. First, the name is bad and should probably be something else.
However, the value selection is correct; one could even argue that it is probably the only one that makes sense.
The reason is that the maximum number of blocks differs for each direction. Because (for ICON) size(horizontal) >>> size(vertical), one would otherwise get launch errors.
…escription. If the leading kind is not known, then it will reorder neither the strides nor the iteration order. However, for certain reasons (launch errors) we have to set one for GPU in that case.
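The launch-error argument above can be illustrated with a small sketch. It relies on the documented CUDA grid limits (up to 2**31 - 1 blocks along x, but only 65535 along y and z). The domain sizes, the block shape, and the helper names are hypothetical and only serve the illustration; that the leading dimension ends up on the x axis of the grid is likewise an assumption made here, not something stated in this conversation.

```python
# CUDA grid-size limits per axis (compute capability >= 3.0).
MAX_GRID_BLOCKS = {"x": 2**31 - 1, "y": 65535, "z": 65535}


def blocks_needed(extent: int, threads_per_block: int) -> int:
    """Number of blocks needed to cover `extent` iterations (ceiling division)."""
    return -(-extent // threads_per_block)


def launch_is_valid(extents: dict[str, int], block: dict[str, int]) -> bool:
    """Check that the block count along every grid axis stays within the limit."""
    return all(
        blocks_needed(extents[axis], block[axis]) <= MAX_GRID_BLOCKS[axis]
        for axis in extents
    )


# Hypothetical sizes: a large horizontal domain and a small vertical one.
N_HORIZONTAL = 5_000_000
N_VERTICAL = 80
BLOCK = {"x": 32, "y": 1}  # hypothetical thread-block shape

# Horizontal mapped to the x axis of the grid: both axes stay within limits.
horizontal_on_x = launch_is_valid({"x": N_HORIZONTAL, "y": N_VERTICAL}, BLOCK)

# Vertical mapped to x: the horizontal extent lands on y and exceeds 65535 blocks.
vertical_on_x = launch_is_valid({"x": N_VERTICAL, "y": N_HORIZONTAL}, BLOCK)
```

With these numbers only the first mapping yields a launchable grid, which matches the reasoning that a leading kind has to be set for GPU even when it is not otherwise known.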
edopao
reviewed
Jan 29, 2026
Contributor
edopao
left a comment
LGTM, only one refactoring suggestion.
src/gt4py/next/program_processors/runners/dace/transformations/auto_optimize.py

Previously, the optimizer assumed that memory was laid out differently on GPU and CPU, i.e. that on CPU the stride-1 dimension is the vertical dimension, while on GPU it is the horizontal dimension. However, this is wrong: in both cases stride 1 is associated with the horizontal dimension.
This PR fixes this, so that the loop order and the memory layout for transients now assume that stride 1 is associated with the horizontal dimension.
Note that the current implementation assumes that there is only one horizontal dimension.
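
As a rough illustration of what "stride 1 is associated with the horizontal dimension" means for a two-dimensional transient, the following sketch computes element strides and the matching loop order. The function name, the dimension labels, and the sizes are hypothetical and do not correspond to the gt4py or DaCe API; like the PR, the sketch assumes a single horizontal dimension.

```python
def strides_with_unit_dim(shape: dict[str, int], unit_dim: str) -> dict[str, int]:
    """Element strides such that `unit_dim` has stride 1.

    The remaining dimensions keep their relative order and become the
    outer (larger-stride) dimensions.
    """
    order = [dim for dim in shape if dim != unit_dim] + [unit_dim]  # outer -> inner
    strides: dict[str, int] = {}
    running = 1
    for dim in reversed(order):
        strides[dim] = running
        running *= shape[dim]
    return strides


# Hypothetical sizes for the illustration.
shape = {"horizontal": 1000, "vertical": 80}

# Layout assumed after this PR, on both CPU and GPU.
new_layout = strides_with_unit_dim(shape, "horizontal")

# Layout the optimizer previously assumed on CPU.
old_cpu_layout = strides_with_unit_dim(shape, "vertical")

# Matching loop order, outermost first: the stride-1 dimension is iterated innermost.
loop_order = sorted(new_layout, key=new_layout.get, reverse=True)
```

Here `new_layout` gives the horizontal dimension stride 1 and the vertical dimension stride 1000, so consecutive horizontal points are adjacent in memory and the innermost loop runs over them.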